Adaptive control process and system

ABSTRACT

A system for adaptively controlling a wide variety of complex processes, despite changes in process parameters and despite both sudden and systematic drifts in the process, uses response surfaces described by quadratic equations or polynomials of any order. The system estimates the dynamic component of a drifting process or system and thereby identifies the trend of output response variables of the controlled process. Using this information, the system predicts future outputs based on a history of past and present inputs and outputs, thereby recommending the necessary control action or recipe (set of input parameters) to cancel out the drifting trend. A specific embodiment is a system for the adaptive control of photoresist thickness, uniformity, and dispense volume in the spin coating of wafers in integrated circuit manufacturing. Methods used in the adaptive control system are adaptable to control many processes not readily modeled by physical equations.

This application is related to U.S. Provisional Patent Application Ser. No. 60/004,328 filed Sep. 26, 1995, priority of which is claimed, and is related to international patent application PCT/US96/15277 filed Sep. 25, 1996.

TECHNICAL FIELD

This invention relates generally to automatic control systems. More particularly, it relates to systems and methods for control of processes using non-linear approximations to generate models of response variables.

BACKGROUND ART

In today's industry, it is desirable to have processes that are easily adaptable to different process specifications and targets while using the same production equipment. Moreover, more stringent quality specifications make it ever more difficult to use trial-and-error methods to attain these desired objectives. A control system enabling high process adaptability for different product specifications, even in the face of changing manufacturing conditions, is of special advantage. Such a control system's control strategy should not only allow for changes in product specifications, but should also be able to recover the process after a maintenance operation or unknown disturbance. It should also be able to compensate for slow drifts and sudden shifts in a process and should have the ability to incorporate both economic and quality specifications in its objective criterion.

A common practice in industry today is to dedicate equipment to specific processes and produce the same product using the same recipe. However, with the incidence of cluster tools that are programmable to do various tasks (especially in the semiconductor industry), the need for flexibility and adaptability is widely felt. For instance, a change in a batch of chemical used in a process or a change in ambient temperature could result in manufacturing products not within specifications if the recipe is not changed appropriately.

At present, the process control methods practiced in industry are statistical process control (SPC) and automatic feedback control. These two methods are used virtually independently of one another, and each is unable to address all the control issues mentioned above. The concerns discussed above have necessitated the development of this invention.

Adaptive control techniques are widely applied in the aerospace industry, for example in autopilot design and other navigation equipment. Unfortunately, this has not been the case in many other industries. Even though the growth of the semiconductor industry has brought along with it many advances in science and technology, and especially advances in the computer industry, there has been relatively little process automation in the semiconductor fabrication industry. Thus, although one may see robots working in an assembly line, most of the processes are often running open loop. Such processes can continue to produce products until a large number of products that are out of specifications are manufactured and tested before an SPC system can issue a warning.

At present, some of the most popular process control methods practiced in industry are statistical process control (SPC), factorial design, and automatic feedback control. In SPC, a history of product statistics is measured and plotted together with the limits for acceptable products. When products consistently go outside of this range, it signifies that the process has changed and that action needs to be taken to rectify this process change. In factorial design, combinations of experiments are conducted off-line to determine the interdependencies of the variables and to generate linear regression models for further analytical work. Automatic feedback control is a well-known field that seeks to control the outputs of a system to track a reference input or to meet a similar objective. This field has been well formalized for linear systems, but the application of feedback theory to nonlinear processes and systems is quite a challenge. It is not surprising, therefore, that there have not been many advances in the area of process automation whereby processes are run in a closed loop without operator intervention. Since most manufacturing lines use machines, raw materials, and process chemicals, the modeling of the entire process is often intractable. Not only are typical real manufacturing processes very complex, but they are also typically nonlinear and not amenable to exact closed-form solutions. Hence, methods such as automatic feedback control, which rely on analytical mathematical models derived from differential equations of the process, are limited. Even though factorial design can provide information about the variable interdependencies, its operation in a closed loop without the involvement of a human operator has not been attained heretofore.

Methods using ULTRAMAX software for control optimization have been described by C. W. Moreno in the article “Self-Learning Optimization Control Software” (Instrument Society of America Proceedings, Research Triangle Park, North Carolina, June 1986) and by C. W. Moreno and S. P. Yunker in the article “ULTRAMAX: Continuous Process Improvement Through Sequential Optimization” (Electric Power Research Institute, Palo Alto, Calif., 1992). Other related publications are the article by C. W. Moreno, “A Performance Approach to Attribute Sampling and Multiple Action Decisions” (AIIE Transactions, September 1979, pp. 183-197) and C. W. Moreno, “Statistical Progress Optimization” (P-Q System Annual Conference, Dayton, Ohio, Aug. 19-21, 1987, pp. 1-14). E. Sachs, A. Hu, and A. Ingolfsson, in an article entitled “Run by Run Process Control: Combining SPC and Feedback Control” (IEEE Transactions on Semiconductor Manufacturing, October 1991), discussed an application combining feedback and statistical process control, which used parallel design of experiments (PDOE) techniques in combination with linear run-by-run controllers.

U.S. Pat. No. 3,638,089 to Gabor discloses a speed control system for a magnetic disk drive having high- and low-level speed means. A feedback control loop compares index marks from a disk unit in conjunction with a counter unit driven by an oscillator to provide a reference level to drive a DC drive motor between a high-level speed above its normal speed, and a low-level speed below its normal speed. An open-loop system also provides high-level and normal speeds. The open-loop system includes a voltage-controlled oscillator (VCO), an amplifier, and an AC drive motor. U.S. Pat. No. 5,412,519 to Buettner et al. discloses a disk storage device which optimizes disk drive spindle speed during low power mode. This system optimizes power savings to the characteristics of the particular drive. A transition speed is recalibrated periodically, and adaptive control can be implemented in this system by altering the time between recalibration cycles, extending the time if little or no change has occurred, or shortening the time when a sample sequence indicates changing status or conditions.

U.S. Pat. No. 5,067,096 to Olson et al. discloses a target engagement system for determining proximity to a target. This system uses target motion analysis to determine a target engagement decision for ground targets such as vehicles. The input to the engagement system is the target azimuth as a function of time. The target is estimated to be within range or out-of-range based on calculation of a ratio of time intervals of crossing specified target azimuth sectors.

U.S. Pat. No. 5,144,595 to Graham et al. discloses an adaptive statistical filter for target motion analysis noise discrimination. The adaptive statistical filter includes a bank of Kalman filters, a sequential comparator module, and an optimum model order and parameter estimate module.

U.S. Pat. No. 5,369,599 to Sadjadi et al. discloses a signal metric estimator for an automatic target recognition (ATR) system. A performance model in the form of a quadratic equation is partially differentiated with respect to a parameter of the ATR, and the partial differentiation allows solution for an estimated metric.

U.S. Pat. No. 5,513,098 to Spall et al. discloses a method of developing a controller for general (nonlinear) discrete-time systems where the equations governing the system are unknown and where a controller is estimated without building or assuming a model for the system. The controller is constructed through the use of a function approximator (FA) such as a neural network or polynomial. This involves the estimation of the unknown parameters within the FA through the use of a stochastic approximation that is based on a simultaneous perturbation gradient approximation.

Thus, a variety of methods for automatic control and especially for automatic target recognition, and systems using the methods, have been developed for specific purposes, some of which do not depend on analytic mathematical models such as differential equations. Some of the methods used in the background art cannot deal with fast-drifting systems, and some rely on small perturbations of the input variables, so that a resulting goal function must lie within a limited range around the desired trajectory.

DISCLOSURE OF THE INVENTION

This invention provides features that enhance existing methods and provides new methods, resulting in a novel system that is extremely flexible and versatile in its applications. The methods of this invention can be applied, not only to systems and processes which can be modeled by differential equations, but also to processes which are described by quadratic or higher-order polynomial models. This invention is described in a Ph.D. dissertation entitled “Adaptive Control of Photoresist Thickness, Uniformity, and Dispense Volume in the Spin Coating of Wafers,” submitted by the present inventor on Sep. 27, 1995 to the University of Vermont, the entire disclosure of which dissertation is incorporated herein by reference. This dissertation is available to the public at the Research Annex, Bailey Howe Library, University of Vermont, Burlington, Vt.

Nomenclature

The term “recipe,” as used in this specification, means a vector or ordered set of input variables for a process to be controlled.

In most manufacturing operations, a tremendous amount of information on processes can be recorded and stored as historical information in databases. The approach of this invention takes advantage of such historical information. This is done by feeding historical data into a run-by-run sequential design of experiment (RBR SDOE) optimization routine, continuing with the optimization process, and finally identifying the optimum operating point. Then models (linear or nonlinear) of the response variables, in terms of the input variables (recipe), can be generated at the optimum operating point. This RBR SDOE approach allows for the definition of multiple objective functions such as performance loss functions and hence allows the optimization (e.g., minimization) of a suitable performance measure while meeting constraints for input and response variables. Once these local models are generated, the nonlinear adaptive controller is initialized using the models. The approach used in this invention is rigorous, and it addresses the fundamental issue of nonlinearity in the uniformity surface response. The effects of uncontrolled variables, of variable interactions, and of second-order terms on the performance measure can be better accounted for using quadratic models. In general, linear models are often a sufficient approximation to the true behavior of the system far from the optimum, but they are not very good for describing response surfaces in the region of the optimum. This is because the region of the optimum usually shows curvature that cannot be explained by linear relationships. Curvature is always accounted for by higher-order terms. Furthermore, when interaction is present in multi-factor systems, linear models cannot adequately describe the “twisted plane” that results from the interaction. The controller of this invention is able to account for all these factors since it is a nonlinear controller.

Purposes, Objects, and Advantages

A major purpose of the invention is to provide an adaptive control system capable of automatically controlling a wide variety of complex processes despite changes in process parameters and despite drifts in the controlled process. A related purpose is to provide methods by which such a system can be implemented.

Thus an important object of the invention is a system for controlling a multi-variable process that cannot be readily modeled by physical equations. Another important object is a process control system that can detect the incidence of manufacturing problems after only a few products are manufactured. Another object is a process control system that can predict the possibility of manufacturing off-specification products. A related object is a system that can sound an alarm before off-specification products are manufactured, so as to avoid waste. These and other purposes, objects, and advantages will become clear from a reading of this specification and the accompanying drawings.

Understanding of the present invention will be facilitated by consideration of the following detailed description of a preferred embodiment of the present invention, taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagrammatic view of a control system made in accordance with the present invention.

FIG. 2 is a flow chart illustrating a process performed in accordance with the present invention.

FIG. 3A is a graph showing mean photoresist thickness in nanometers vs. run number data in an optimization phase of an example process.

FIG. 3B is a graph showing standard deviation of photoresist thickness vs. run number data in an optimization phase of an example process corresponding to FIG. 3A.

FIG. 3C is a graph showing photoresist dispense volume in milliliters vs. run number data in the example process of FIGS. 3A and 3B.

FIG. 4a is a three-dimensional response-surface plot of photoresist thickness in nanometers as dependent variable vs. two independent variables: photoresist dispense speed in kilo-revolutions per minute (Krpm) and photoresist spread speed in kilo-revolutions per minute (Krpm) in an example process.

FIG. 4b is a response-surface contour plot of photoresist thickness in nanometers as dependent variable vs. two independent variables: photoresist dispense speed in kilo-revolutions per minute (Krpm) and photoresist spread speed in kilo-revolutions per minute (Krpm), corresponding to FIG. 4a.

FIG. 5a is a three-dimensional response-surface plot of standard deviation of photoresist thickness in nanometers as dependent variable vs. two independent variables: photoresist dispense speed in kilo-revolutions per minute (Krpm) and photoresist spread speed in kilo-revolutions per minute (Krpm) in the example process of FIGS. 4a and 4b.

FIG. 5b is a response-surface contour plot of standard deviation (one-sigma) of photoresist thickness in nanometers as dependent variable vs. two independent variables: photoresist dispense speed in kilo-revolutions per minute (Krpm) and photoresist spread speed in kilo-revolutions per minute (Krpm), corresponding to FIG. 5a.

FIG. 6a is a graph showing real process output (o) and adaptive control (least-squares open-loop) simulated output (x) of mean photoresist thickness in nanometers as dependent variable vs. run number as independent variable for an example process.

FIG. 6b is a graph showing real process output (+) and adaptive control (least-squares open-loop) simulated output (o) of standard deviation (one-sigma) of photoresist thickness in nanometers as dependent variable vs. run number as independent variable for an example process.

FIG. 7a is a graph showing real process output (+) and adaptive control (closed-loop) simulated output (o) of mean photoresist thickness in nanometers as dependent variable vs. run number as independent variable for an example process.

FIG. 7b is a graph showing real process output (+) and adaptive control (closed-loop) simulated output (o) of standard deviation (one-sigma) of photoresist thickness in nanometers as dependent variable vs. run number as independent variable for an example process.

MODES FOR CARRYING OUT THE INVENTION

A general description of the adaptive control system in this invention is illustrated in FIG. 1, where the control system strategy is applied to the photoresist-coating of wafers in semiconductor integrated circuit (IC) manufacturing. The object here is to use the adaptive control system to obtain a photoresist film of specified mean thickness with the best possible uniformity by appropriately choosing the input variables or recipes to achieve this in the face of changes in process parameters. The process has about fourteen potentially important input variables and three output variables. The output variables are the cross-wafer mean thickness, the cross-wafer standard deviation or uniformity, and photoresist dispense volume. The dispense volume is actually an input variable which is also required to be minimized. In FIG. 1, T_(m)(a) and S_(m)(a) denote the predicted thickness and standard deviation, respectively, while T_(d) and S_(d) respectively represent the desired thickness and standard deviation.

As a first step to the application of this adaptive control system, a nominal empirical model of the system is obtained. This can be done by optimizing a compound performance loss function using sequential design of experiment techniques. This step can be replaced with parallel design of experiment techniques or by using physical models of the process. The optimization process will eventually identify the key input variables, and the variables that have little or no influence on the response variables can be considered as constants. The resulting model in terms of the key variables may be of the form

$F(x) = C + \sum_{i=1}^{N} \left[ G(i)\,dX(i) \right] + \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ H(i,j)\,dX(i)\,dX(j) \right] \qquad \text{Equation (1)}$

where

N = number of input variables

C = constant term

R = reference vector

dX = X − R (if R = 0, then dX = X; X is a vector of input variables, the recipe)

G is the gradient vector

H is the Jacobian matrix
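As a minimal sketch of how Equation (1) might be evaluated for a given recipe, the following Python function computes the quadratic response. The function name `quadratic_response` and its argument names are illustrative assumptions and do not come from the disclosure; H is treated here simply as the N×N matrix of second-order coefficients that the text calls the Jacobian matrix.

```python
def quadratic_response(x, c, g, h, r=None):
    """Evaluate F(x) = C + sum_i G(i)*dX(i) + sum_i sum_j H(i,j)*dX(i)*dX(j).

    x : recipe (list of N input variables)
    c : constant term C
    g : first-order coefficients G (length N)
    h : second-order coefficients H (N x N nested list)
    r : reference vector R; if omitted, R = 0 so that dX = X
    """
    n = len(x)
    if r is None:
        r = [0.0] * n
    # dX = X - R, the deviation of the recipe from the reference point
    dx = [xi - ri for xi, ri in zip(x, r)]
    linear = sum(g[i] * dx[i] for i in range(n))
    quadratic = sum(h[i][j] * dx[i] * dx[j]
                    for i in range(n) for j in range(n))
    return c + linear + quadratic


# Hypothetical two-variable coefficients, chosen only for illustration.
value = quadratic_response([1.0, 2.0], 1.0, [2.0, 3.0],
                           [[1.0, 0.5], [0.5, 2.0]])
```

At the reference point itself (X = R) both sums vanish and the function returns the constant term C, which is one quick sanity check on a fitted model.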

The series of steps in implementing this control system strategy is illustrated using a 2-variable (x₁ and x₂) process as an example. The quadratic model of a 2-variable process can be represented as

y = a₀ + a₁x₁ + a₂x₂ + a₁₁x₁² + a₂₂x₂² + a₁₂x₁x₂  Equation (2)

which in vector form will be

y = φ^(T)θ₀  Equation (3)

where,

φ^(T) = [1  x₁  x₂  x₁²  x₂²  x₁x₂]

θ₀^(T) = [a₀  a₁  a₂  a₁₁  a₂₂  a₁₂]
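The vector form of Equations (2) and (3) can be sketched in a few lines of Python. This is only an illustration: the helper names `regressor` and `model_output` are assumptions, and the last regressor entry is read as the cross term x₁·x₂, which is the usual interaction term of a two-variable quadratic model.

```python
def regressor(x1, x2):
    """Build phi = [1, x1, x2, x1^2, x2^2, x1*x2] for a 2-variable recipe."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def model_output(phi, theta):
    """Compute y = phi^T theta, the inner product of regressor and parameters."""
    return sum(p * t for p, t in zip(phi, theta))


# theta is ordered as [a0, a1, a2, a11, a22, a12]; these values are hypothetical.
theta = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
y = model_output(regressor(2.0, 3.0), theta)
```

Writing the model this way keeps it linear in the parameters even though it is quadratic in the recipe, which is what allows a least squares estimator to be applied to it.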

Unlike future outputs of autoregressive moving average models, which can be expressed by recursive relations in terms of past outputs and past and present inputs, the resulting models given by Equations (1) and (2) are static, and hence cannot capture the dynamic behavior of a process. Hence, they cannot be used to predict the output of the process in a time-series sense. The calculation of the error term

e(t) = y(t) − ŷ(t)  Equation (4)

in the least squares estimation process requires knowledge of the predicted output

ŷ(t) = φ(t−1)^(T) θ̂(t−1)  Equation (5)

This information is not available in models obtained by experimental design techniques. Hence, a method of predicting future outputs is needed. In this invention, polynomial extrapolation is applied to available historical data to predict the mean thickness and standard deviation for the next run. This then replaces what can be seen as the predictive part of autoregressive moving average models. What this suggests is that parameter estimation and prediction should be done somewhat independently, with prediction following estimation in the same loop. Thus, two least squares estimators were used in implementing this control system strategy. The first least squares estimator is used in modeling the process behavior, up to the current run, by choosing parameters of the quadratic model such that the error between the measured responses and the model responses is minimized in a least squares sense. Polynomial extrapolation is then used, in conjunction with historical data, to obtain the thickness and standard deviation of the next run. These extrapolated response values are then used, as if they were the real outputs, to update the parameters of the second estimator. Thus, the resulting parameters of the second estimation process can be used to predict the responses of the drifting process if the recipe is known. If we were to continue to use the current recipe, the process would continue to drift. Given this current process drift as captured by the parameters of the second estimation process, this invention provides a way to find the correct recipe to apply in order to cancel out the drifting trend and simultaneously satisfy all the response targets.
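The prediction step can be illustrated with a small Python sketch. The text does not specify the order of the polynomial or how it is fitted, so the following is an assumption: it passes a polynomial of a chosen order exactly through the most recent `order + 1` responses (Lagrange form, equally spaced runs) and evaluates it one run ahead. The name `extrapolate_next` is hypothetical.

```python
def extrapolate_next(history, order=1):
    """Predict the next-run response from the last (order + 1) responses.

    history : measured responses, one per run, oldest first
    order   : degree of the interpolating polynomial (1 = straight line)
    """
    k = order + 1
    if len(history) < k:
        raise ValueError("need at least order + 1 data points")
    ys = history[-k:]
    # The k points sit at run indices 0..k-1; the prediction is at index k.
    prediction = 0.0
    for i, yi in enumerate(ys):
        weight = 1.0
        for j in range(k):
            if j != i:
                # Lagrange basis polynomial for point i, evaluated at index k
                weight *= (k - j) / (i - j)
        prediction += weight * yi
    return prediction


# Hypothetical thickness history in nanometers, drifting upward by 2 nm per run.
next_thickness = extrapolate_next([1000.0, 1002.0], order=1)
```

With two data points and `order=1` this reduces to continuing the straight line through the last two runs, which matches the situation described later where the first two wafers are used to predict the third run.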

As an illustrative example, this invention was applied to the spin-coating of wafers in IC manufacturing. The problem statement is that a system is desired that would model the process of depositing a specified thickness of photoresist thin film (a 1000-nanometer (1 μm) film was studied in this example case), with the best possible uniformity using the least quantity of photoresist chemical. To apply this invention to the spin-coating process, the latter was exercised through a number of run/advice cycles of an optimization routine. The results are summarized in FIGS. 3A, 3B, and 3C. The figures show that the photoresist dispense volume was reduced from about 8 milliliters to about 4.3 milliliters for a 1000 nm film. This amounts to almost 50% reduction in chemical usage and translates to millions of dollars of savings for a typical modern semiconductor fabricating facility. It can be seen that the standard deviation reduced steadily run-by-run. The cross-wafer mean thickness, however, remained fairly constant at 1000 nm. The photoresist dispense volume is an input variable, but it is also required to be minimized and so it is also defined as a calculated output variable. For physical constraint reasons the dispense volume could not be reduced below 4.3 milliliters without affecting the quality of the films. So, the optimum value of the volume was set at 4.3 milliliters and assumed as a constant thereafter. Hence, hereafter only the cross-wafer mean thickness and standard deviation are considered as output variables to be optimized.
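As a quick check of the quoted saving, using only the approximate dispense volumes stated above (about 8 milliliters before optimization and about 4.3 milliliters after):

$\frac{8 - 4.3}{8} = \frac{3.7}{8} \approx 0.46$

That is, roughly a 46% reduction in dispensed photoresist per wafer, which is consistent with the description of the saving as almost 50%.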

Models are generated from the optimization process above, and several pieces of information, such as the key variables and the percentage contribution of the input variables to the response variables, can be deduced from the models. 3D surface response plots can be generated from the models to give a pictorial view of the variable dependencies. FIGS. 4a, 4b, 5a, and 5b show 3D and contour plots of the film mean thickness and standard deviation with two independent variables, dispense speed and spread speed. Note that even though the optimization process drove the process to the neighborhood of the desired targets as shown in FIGS. 3A, 3B, and 3C, it still exhibited large run-by-run variability, particularly in the mean thickness and standard deviation. Thus, to reduce the run-by-run variability, the models generated from the optimization phase are used in initializing a novel adaptive controller.

The chronological sequence of events involved in engaging the novel adaptive controller from the start of the process is as follows. After the first wafer is spin-coated with photoresist, a cross-sectional mean thickness measurement is obtained using a measurement tool. Thereafter the real mean thickness T̄_(r) and sample standard deviation s̄_(r) are computed. Then, initial parameter values of the first estimator are chosen using the model coefficients determined from the optimization phase, and these parameters are updated with the new process data as follows. Given the current recipe x and the current parameters a and b (x, a, and b being vectors), the model thickness and standard deviation can be calculated from Equation (1) as T̄_(m)(a, x) and s̄_(m)(b, x) respectively. A new set of parameters is chosen by minimizing the error between the model response and the real response. That is, the problem definition is:

min[T̄_(r) − T̄_(m)(a, x)]²  Equation (6)

subject to a

min[s̄_(r) − s̄_(m)(b, x)]²  Equation (7)

subject to b

where

T_(m)(a, x) = a₀ + a₁x₁ + a₂x₂ + a₁₁x₁² + a₁₂x₁x₂

s_(m)(b, x) = b₀ + b₁x₁ + b₂x₂ + b₁₁x₁² + b₁₂x₁x₂

The actual iterative equations for implementing the least squares algorithm of Equations (6) and (7) are given below in Equations (8) and (9) as

$\hat{\theta}(t) = \hat{\theta}(t-1) + \frac{P(t-2)\,\varphi(t-1)}{1 + \varphi(t-1)^{T} P(t-2)\,\varphi(t-1)} \left[ y(t) - \varphi(t-1)^{T} \hat{\theta}(t-1) \right] \qquad \text{Equation (8)}$

$P(t-1) = P(t-2) - \frac{P(t-2)\,\varphi(t-1)\,\varphi(t-1)^{T} P(t-2)}{1 + \varphi(t-1)^{T} P(t-2)\,\varphi(t-1)}, \quad t \geq 1 \qquad \text{Equation (9)}$

with given initial estimate,

θ̂(0), P(−1) = kP

where k is a large constant and P is any positive definite matrix, typically the identity matrix I; θ̂(t) and φ(t) have already been defined in Equation (3).
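One recursion of Equations (8) and (9) can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in the example process: the function name `rls_update` is an assumption, the gain denominator is taken as 1 + φ(t−1)ᵀP(t−2)φ(t−1) (the form that appears in Equation (9)), and P is assumed symmetric, which holds when it is initialized as k·I.

```python
def rls_update(theta, P, phi, y):
    """One recursive least squares step.

    theta : current parameter estimate, theta_hat(t-1)
    P     : current symmetric matrix P(t-2), as a nested list
    phi   : regressor vector phi(t-1)
    y     : newly measured (or extrapolated) response y(t)
    Returns (theta_hat(t), P(t-1)).
    """
    n = len(theta)
    # P * phi, used in both the gain and the covariance update
    p_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(phi[i] * p_phi[i] for i in range(n))
    # Prediction error: y(t) - phi(t-1)^T theta_hat(t-1)
    error = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + p_phi[i] * error / denom for i in range(n)]
    # For symmetric P, P*phi*phi^T*P equals the outer product of p_phi with itself
    P_new = [[P[i][j] - p_phi[i] * p_phi[j] / denom for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


# Hypothetical check: recover y = 2 + 3x from noise-free data, with P(-1) = k*I.
k = 1e6
theta = [0.0, 0.0]
P = [[k, 0.0], [0.0, k]]
for x in [0.0, 1.0, 2.0, 3.0]:
    theta, P = rls_update(theta, P, [1.0, x], 2.0 + 3.0 * x)
```

In this toy case the estimate approaches [2, 3] once as many independent data points as parameters have been processed. A large k expresses low confidence in the initial estimate, so the first few measurements dominate.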

To verify that this procedure actually succeeds in tracking the response variables, the least squares process was applied to a set of historical data and the result is presented in FIGS. 6a and 6b. From the graphs it can be seen that the model developed tracks the response variables very well. However, since the response variables (thickness and standard deviation) are available after the fact, we expect that the estimator results will lag behind the current process by one run, and this expectation is actually confirmed in the plots. This again affirms the need for the incorporation of a method of prediction in order that the estimator model output predictions will synchronize with the system outputs.

So, after two wafers are processed and the first estimator parameters are updated sequentially, after each data point is available, using Equations (8) and (9), we obtain the model equations T_(m2), s_(m2) as given by Equations (6) and (7). Then, the first two data points are used as starting points in doing polynomial extrapolation to predict the value of the outputs for the third run; thereafter, the second estimator is engaged.

Let the extrapolated mean thickness and standard deviation for the third run be T̄_(e3) and s̄_(e3) respectively. Then, using parameters a, b estimated by the previous estimator as the initial parameter guess for the second estimator, the optimization problem becomes:

min[T̄_(e3) − T̄_(m2)(a)]²  Equation (10)

subject to a;

min[s̄_(e3) − s̄_(m2)(b)]²  Equation (11)

subject to b.

These new parameters a, b pertain to the state of the process one step ahead, assuming that the same recipe x is used. From these new parameter values, we can compute the new predicted thickness T̄_(p3) and standard deviation s̄_(p3) using Equation (1). Thus, if the process is drifting we could still predict the mean thickness and standard deviation one step ahead, given that we maintain the recipe as the previous run value. If our desired thickness and standard deviation targets are, respectively, T̄_(d) and s̄_(d), then we can back compute to find the recipe that ought to be used in the next run to prevent the process from drifting. Hence, the problem becomes that of choosing a recipe x such that the targets T̄_(d) and s̄_(d) are simultaneously met. Thus, we want

T̄(x)_(p3) = T̄_(d) → T̄(x)_(p3) − T̄_(d) = 0  Equation (12)

s̄(x)_(p3) = s̄_(d) → s̄(x)_(p3) − s̄_(d) = 0  Equation (13)

Thus, the recipe to use on the next run to get the mean thickness and standard deviation to target is found by simultaneously solving the two nonlinear Equations (12) and (13). This procedure is repeated until the system converges to the targeted values.
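The text does not say which numerical method is used to solve Equations (12) and (13). As one possible sketch, assuming a two-variable recipe and quadratic models with coefficients ordered as in Equation (2), Newton's method with an analytic 2×2 Jacobian could be used. The names `model`, `gradient`, and `solve_recipe`, the starting point, and the tolerances are all assumptions made for illustration.

```python
def model(c, x1, x2):
    """Quadratic model with coefficients c = [c0, c1, c2, c11, c22, c12]."""
    return (c[0] + c[1] * x1 + c[2] * x2
            + c[3] * x1 * x1 + c[4] * x2 * x2 + c[5] * x1 * x2)


def gradient(c, x1, x2):
    """Partial derivatives of the quadratic model with respect to x1 and x2."""
    return (c[1] + 2.0 * c[3] * x1 + c[5] * x2,
            c[2] + 2.0 * c[4] * x2 + c[5] * x1)


def solve_recipe(a, b, t_d, s_d, x0, tol=1e-10, max_iter=50):
    """Find (x1, x2) with model(a, x) = t_d and model(b, x) = s_d."""
    x1, x2 = x0
    for _ in range(max_iter):
        f1 = model(a, x1, x2) - t_d   # residual of Equation (12)
        f2 = model(b, x1, x2) - s_d   # residual of Equation (13)
        if abs(f1) < tol and abs(f2) < tol:
            return x1, x2
        j11, j12 = gradient(a, x1, x2)
        j21, j22 = gradient(b, x1, x2)
        det = j11 * j22 - j12 * j21
        if abs(det) < 1e-14:
            raise ArithmeticError("singular Jacobian; try another start point")
        # Newton step: solve J * d = -f by Cramer's rule
        d1 = (-f1 * j22 + f2 * j12) / det
        d2 = (-f2 * j11 + f1 * j21) / det
        x1 += d1
        x2 += d2
    raise ArithmeticError("Newton iteration did not converge")


# Hypothetical models: thickness = x1 + x2, standard deviation = x1 * x2.
a = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
b = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
recipe = solve_recipe(a, b, t_d=5.0, s_d=6.0, x0=(1.0, 4.5))
```

Because the equations are nonlinear, more than one recipe may satisfy both targets (here both (2, 3) and (3, 2) do), so starting the iteration from the current recipe is a natural way to pick the nearest solution. Two targets can only be met exactly when at least two recipe variables are free; with more free variables, or with infeasible targets, a constrained least squares formulation would be needed instead.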

A summary of the series of steps for the implementation of the above adaptive control system is shown in FIG. 2 and outlined below:

1. Initialize the adaptive controller by choosing appropriate initial parameter values. Using these parameters and the nominal recipe values x, compute the model predicted thickness T̄_(m1) and sample standard deviation s̄_(m1) using Equation (1).

2. Process the first wafer and compute the real mean thickness T̄_(r1) and sample standard deviation s̄_(r1).

3. Compute the resulting error between the model prediction in 1 and the real process results in 2. Using this error, in conjunction with the least squares process, update the adaptive controller parameters as shown in Equations (6) and (7).

4. Process the next wafer and compute T̄_(r2) and s̄_(r2).

5. Update the model parameters again using the last processed wafer.

6. Using available real process data, do polynomial extrapolation to determine the predicted thickness T̄_(e3) and predicted standard deviation s̄_(e3).

7. Update the adaptive controller parameters and compute the new model predicted thickness T̄_(p3) and standard deviation s̄_(p3). This will be the state of the process in the next run if the previous recipe is used.

8. Determine the optimum recipe x to use to drive the process to the desired targets T̄_(d) and s̄_(d) by simultaneously solving the two Equations (12) and (13).

9. Using the recipe obtained in 8 as the current recipe, go to 4 and loop until a stopping condition is met or until the process is stable enough to allow disengagement of the adaptive controller, if necessary.

The results obtained by applying this procedure to our test process are summarized in FIGS. 7a and 7b. It can be seen that even though the controller was initialized with parameters that resulted in large initial errors, the controller still converged to the optimum point in a few steps. This fact was confirmed by running a number of experiments, and all the results were in close agreement. This shows that the operation of the adaptive controller is not influenced very much by the initial choice of the parameters and that it tends to drive the system towards the optimum operating point. Indeed, for the least squares estimation process, parameters converge after n runs, where n is the number of parameters to be estimated. In this experiment, the calculation of the rate of convergence is complicated by the fact that we are dealing with two adaptive controllers running in parallel: one for the thickness and the other for the standard deviation. Moreover, the update of the parameters is done twice in a loop. In spite of this complication, we can see that the controllers are well bounded and converge. It is interesting to note that as more weight is assigned to the thickness performance measure, the adaptive controller for thickness converges to the target thickness of 1000 nm. However, the weight placed on standard deviation is decreased accordingly, and so it was difficult to maintain it at 1.5 nm. After different weights were assigned to the thickness and standard deviation criteria, it became clear that the tool was incapable of meeting more stringent process requirements. At run number 8 (wafer no. 8), the thickness obtained was approximately 1000 nm with a standard deviation of about 2.3 nm. After run 9 (wafer no. 9), the system had settled to the final steady-state value and the standard deviation had started leveling out. It was not possible to get more than ten runs in an uninterrupted sequence with the setup available for the illustrated example.

From the models developed, the sensitivity of the process to changes in the input variables was studied and the key variables were identified. From the sensitivity studies it was clear that a tool capable of delivering products with tight tolerances requires that at least some, if not all, of the key process variables have good resolution and tolerance. This information may in turn serve as a good input for specifying the tolerances and resolutions of the components and devices to be used in building equipment. For example, the speed resolution and regulation of electric motors to be used in building electrical equipment, and the resolution and accuracy of sensors to be used in designing electrical tools, can be specified in this way.
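As a hedged illustration of such a sensitivity study, the partial derivatives of the quadratic thickness model with respect to each input can be evaluated at an operating point and used to estimate, to first order, the input resolution needed to hold a given output tolerance. The coefficient values, the operating point, the 1 nm tolerance, and the function name `thickness_gradient` are all hypothetical, and the term written x₁₂ is again assumed to mean the product x₁x₂.

```python
def thickness_gradient(a, x1, x2):
    """Partial derivatives of T = a0 + a1*x1 + a2*x2 + a11*x1^2 + a12*x1*x2.

    Assumption: the cross term is the product x1*x2.
    """
    a0, a1, a2, a11, a12 = a
    dT_dx1 = a1 + 2.0 * a11 * x1 + a12 * x2
    dT_dx2 = a2 + a12 * x1
    return dT_dx1, dT_dx2


# Hypothetical model coefficients and coded operating point.
a = [1000.0, -30.0, 12.0, 4.0, -2.5]
g1, g2 = thickness_gradient(a, 1.0, 2.0)

# First-order estimate: an input step dx changes the output by about |g| * dx,
# so holding the output within tolerance_nm needs dx <= tolerance_nm / |g|.
tolerance_nm = 1.0
res_x1 = tolerance_nm / abs(g1)
res_x2 = tolerance_nm / abs(g2)
```

In this hypothetical case the model is more sensitive to x₁ than to x₂, so x₁ would call for the finer resolution.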

The functional elements of the process and system of FIGS. 1 and 2 may be discrete components or modules of a software program run on a known computer. Alternatively, they may be discrete electrical or electronic components capable of performing the functions described herein. It is believed that one of ordinary skill in the art having the above disclosure before him could produce these components without undue experimentation.

INDUSTRIAL APPLICABILITY

This invention provides an automated means of optimizing a process and minimizing the run-by-run variability. A novel adaptive controller is used to estimate the characteristics of a process. Polynomial extrapolation is used in predicting future outputs and, in conjunction with a second adaptive estimator, a model representing the drifting process can be obtained. Based on this model, the correct recipe to cancel out the drifting trend can then be computed and applied to prevent the process from drifting.
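The following is a minimal sketch, under stated assumptions, of how extrapolation and recipe correction might fit together for a single input variable. It assumes a one-input quadratic model, a linear (first-order polynomial) extrapolation from two runs, a drift that is absorbed entirely into the constant term, and Newton iteration for solving the model equation; none of these simplifications is asserted to be the disclosed implementation, and all names and numerical values are hypothetical.

```python
def predict(a, x):
    """One-input quadratic model T(x) = a0 + a1*x + a11*x^2 (hypothetical simplification)."""
    a0, a1, a11 = a
    return a0 + a1 * x + a11 * x * x


def slope(a, x):
    """Derivative dT/dx of the model above."""
    _, a1, a11 = a
    return a1 + 2.0 * a11 * x


target = 1000.0          # desired mean thickness, nm
a = [1026.0, -30.0, 4.0]  # hypothetical coefficients; predict(a, 1.0) == 1000.0
x = 1.0                   # current coded recipe

# Two consecutive measured means at the current recipe show an upward drift.
t1, t2 = 1000.0, 1003.0

# Linear extrapolation to the next run.
t_e3 = 2.0 * t2 - t1

# Assumption: the drift is absorbed as an offset in the constant term.
a[0] += t_e3 - predict(a, x)

# Solve predict(a, x_new) = target by Newton iteration from the current recipe.
x_new = x
for _ in range(50):
    err = predict(a, x_new) - target
    if abs(err) < 1e-9:
        break
    x_new -= err / slope(a, x_new)
```

Applying `x_new` on the next run would, under these assumptions, offset the extrapolated drift so that the predicted output returns to the target.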

While the invention has been particularly shown and described with reference to a preferred embodiment thereof, it will be understood by those skilled in the art that various other changes in the form and details may be made therein without departing from the spirit and scope of the invention.

What is claimed is:
 1. A method for adaptively controlling a process operating on a sequence of samples according to a recipe having recipe values, said method comprising the steps of:

(a) initializing an adaptive controller by setting initial parameter values and setting nominal recipe values, said adaptive controller being described by a first parameter $T_{r}$, a standard deviation of said first parameter $s_{r}$, an extrapolated value $T_{e3}$ of said first parameter, an extrapolated value of standard deviation $s_{e3}$ of said first parameter, a model value $T_{m2}(a)$ of said first parameter, a model value of standard deviation $\overline{s}_{m2}(b)$ of said first parameter, a mean value $\overline{T}(x)_{p3}$ of said first parameter, a mean value of standard deviation $\overline{s}(x)_{p3}$ of said first parameter, a desired mean value $\overline{T}_{d}$ of said first parameter, and a desired mean value of standard deviation $\overline{s}_{d}$ of said first parameter;

(b) computing a model-predicted parameter value using first equation

$$F(x) = C + \sum_{i=1}^{N} \left[ G(i) * dX(i) \right] + \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ H(i,j) * dX(i) * dX(j) \right] \qquad \text{Equation (1)}$$

where,
N = Number of input variables
C = Constant term
R = Reference vector
dX = X - R (If R = 0, then dX = X; X is a vector of input variables)
G is the gradient vector
H is the Jacobian matrix

(c) processing a first sample of said sequence, measuring a first parameter of said first sample multiple times to obtain a mean value and sample standard deviation for said first sample;

(d) computing the resulting error of said first parameter and updating the parameters of said first sample by using equations

$\min[\overline{T}_{r} - \overline{T}_{m}(a, x)]^{2}$  Equation (6)

subject to a and

$\min[\overline{s}_{r} - \overline{s}_{m}(b, x)]^{2}$  Equation (7)

subject to b where

$T_{m}(a,x) = a_{0} + a_{1}x_{1} + a_{2}x_{2} + a_{11}x_{1}^{2} + a_{12}x_{12}$

$s_{m}(b,x) = b_{0} + b_{1}x_{1} + b_{2}x_{2} + b_{11}x_{1}^{2} + b_{12}x_{12}$

(e) processing at least a second sample and extrapolating to find the value of the next sample point and using it, as if it is the true sample, to update adaptive controller parameters by using equations

$\min[\overline{T}_{e3} - \overline{T}_{m2}(a)]^{2}$  Equation (10)

subject to a; and

$\min[\overline{s}_{e3} - \overline{s}_{m2}(b)]^{2}$  Equation (11)

subject to b,

(f) computing the optimum recipe to use on the next run by using the simultaneous nonlinear equations

$\overline{T}(x)_{p3} = \overline{T}_{d} \rightarrow \overline{T}(x)_{p3} - \overline{T}_{d} = 0$  Equation (12)

and

$\overline{s}(x)_{p3} = \overline{s}_{d} \rightarrow \overline{s}(x)_{p3} - \overline{s}_{d} = 0$  Equation (13)

(g) repeating the above steps (b) through (f) until a stopping condition is met.
 2. The method for adaptively controlling a process of claim 1, wherein said stopping condition includes a minimum value for time dependence of at least one of said parameter values.
 3. An automatic control system for controlling a process described by physical equations or empirical models, said process having input variables and having response variables to be controlled, said automatic control system comprising a computer of known type, programmed with instructions to perform the steps of:

(a) initializing an adaptive controller by setting initial parameter values and setting nominal recipe values, said adaptive controller being described by a first parameter $T_{r}$, a standard deviation of said first parameter $s_{r}$, an extrapolated value $T_{e3}$ of said first parameter, an extrapolated value of standard deviation $s_{e3}$ of said first parameter, a model value $T_{m2}(a)$ of said first parameter, a model value of standard deviation $s_{m2}(b)$ of said first parameter, a mean value $\overline{T}(x)_{p3}$ of said first parameter, a mean value of standard deviation $\overline{s}(x)_{p3}$ of said first parameter, a desired mean value $\overline{T}_{d}$ of said first parameter, and a desired mean value of standard deviation $\overline{s}_{d}$ of said first parameter;

(b) computing a model-predicted parameter value using first equation

$$F(x) = C + \sum_{i=1}^{N} \left[ G(i) * dX(i) \right] + \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ H(i,j) * dX(i) * dX(j) \right] \qquad \text{Equation (1)}$$

where,
N = Number of input variables
C = Constant term
R = Reference vector
dX = X - R (If R = 0, then dX = X; X is a vector of input variables)
G is the gradient vector
H is the Jacobian matrix

(c) processing a first sample of said sequence, measuring a first parameter of said first sample multiple times to obtain a mean value and sample standard deviation for said first parameter of said first sample;

(d) computing the resulting error of said first parameter and updating the parameters of said first sample using equations

$\min[\overline{T}_{r} - \overline{T}_{m}(a, x)]^{2}$  Equation (6)

subject to a and

$\min[\overline{s}_{r} - \overline{s}_{m}(b, x)]^{2}$  Equation (7)

subject to b where

$T_{m}(a,x) = a_{0} + a_{1}x_{1} + a_{2}x_{2} + a_{11}x_{1}^{2} + a_{12}x_{12}$

$s_{m}(b,x) = b_{0} + b_{1}x_{1} + b_{2}x_{2} + b_{11}x_{1}^{2} + b_{12}x_{12}$

(e) processing at least a second sample and extrapolating to find the value of the next sample point and using it, as if it is the true sample, to update adaptive controller parameters using equations

$\min[\overline{T}_{e3} - \overline{T}_{m2}(a)]^{2}$  Equation (10)

subject to a; and

$\min[\overline{s}_{e3} - \overline{s}_{m2}(b)]^{2}$  Equation (11)

subject to b,

(f) computing optimum recipe values to use on the next run using the simultaneous nonlinear equations

$\overline{T}(x)_{p3} = \overline{T}_{d} \rightarrow \overline{T}(x)_{p3} - \overline{T}_{d} = 0$  Equation (12)

and

$\overline{s}(x)_{p3} = \overline{s}_{d} \rightarrow \overline{s}(x)_{p3} - \overline{s}_{d} = 0$  Equation (13)

and substituting said optimum values into said first equation; and

(g) repeating the above steps (b) through (f) until a stopping condition is met.
 4. The automatic control system of claim 3, wherein said stopping condition includes a minimum value for time dependence of at least one of said parameter values.